The Φ Accrual Failure Detector
Abstract
Detecting failures is a fundamental issue for fault tolerance in distributed systems. Recently, there has been growing recognition that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, efforts in this direction have not been successful so far. One reason is the difficulty of satisfying several application requirements simultaneously with classical failure detectors. We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block for implementing failure detectors in distributed systems. Instead of providing information of a Boolean nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale. The principal merit of this approach is that it allows a nearly complete decoupling between application requirements and the monitoring of the environment: the detector only reports a suspicion level, and each application interprets that level according to its own requirements. In this paper, we describe an implementation of such an accrual failure detector, which we call the φ failure detector. The particularity of the φ failure detector is that it dynamically adjusts the scale on which the suspicion level is expressed to current network conditions. We analyzed the behavior of our φ failure detector over an intercontinental communication link for several days. Our experimental results show that the φ failure detector performs as well as other known adaptive failure detection mechanisms, while offering improved flexibility.
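A minimal Python sketch of an accrual failure detector along these lines follows. It is an illustration under stated assumptions, not the paper's implementation. It assumes, as an illustrative choice, that heartbeat inter-arrival times are normally distributed, with mean and standard deviation estimated over a sliding window. It also assumes that the suspicion level is φ = -log10(P_later), where P_later is the estimated probability that the next heartbeat arrives later than the time already elapsed since the last one. Because the estimate tracks recent inter-arrival times, the scale of φ follows current network conditions. The class `PhiAccrualDetector`, its parameters `window_size` and `min_std`, and the two application thresholds are hypothetical. The usage at the end illustrates the decoupling: both applications read the same suspicion level and differ only in the threshold they apply.

```python
import math
from collections import deque
from typing import Optional


class PhiAccrualDetector:
    """Minimal accrual failure detector sketch (hypothetical class).

    Assumption: heartbeat inter-arrival times follow a normal distribution
    whose mean and standard deviation are estimated over a sliding window.
    """

    def __init__(self, window_size: int = 100, min_std: float = 0.01) -> None:
        self.intervals: deque = deque(maxlen=window_size)  # recent inter-arrival times (s)
        self.last_heartbeat: Optional[float] = None
        self.min_std = min_std  # floor so perfectly regular arrivals never divide by zero

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Return the suspicion level at time `now` on a continuous scale."""
        if not self.intervals:
            return 0.0  # no history yet: express no suspicion
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        var = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last_heartbeat
        # P_later: probability that the next heartbeat arrives later than `elapsed`,
        # i.e. 1 - CDF of N(mean, std^2), written with the complementary error function.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        # Clamp so log10 never receives 0 when the tail probability underflows.
        return -math.log10(max(p_later, 1e-300))


detector = PhiAccrualDetector()
for t in [0.0, 1.0, 2.1, 2.9, 4.0, 5.0, 6.1, 7.0]:  # heartbeats roughly 1 s apart
    detector.heartbeat(t)

# Two hypothetical applications read the same suspicion level but apply
# different thresholds: one reacts quickly, the other tolerates longer delays.
thresholds = {"fast-reacting app": 1.0, "conservative app": 8.0}
for now in (7.5, 8.5, 9.5):
    level = detector.phi(now)
    suspecting = [name for name, th in thresholds.items() if level > th]
    print(f"t={now:.1f}s phi={level:.2f} suspected by: {', '.join(suspecting) or 'none'}")
```

Raising an application's threshold trades detection speed for fewer false suspicions. The detector itself is unchanged, which is the kind of flexibility the accrual abstraction aims for.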